-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor and enhance pipeline with batch processing #607
base: main
Are you sure you want to change the base?
Conversation
* Implement batch processing for node updates and refactor IP address handling * Remove history IP table * Remove unused function * remove history name tables * Remove unused GetMetadata function from NodeMetadata * Keep remove unused function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Overview
This PR refactors the pipeline by introducing a batch processing mechanism for updating node last seen times and removing legacy historical tracking functions. Key changes include:
- Implementation of a new batch writer (in cmd/tls/handlers/writers.go) to accumulate and flush node update events.
- Updates in multiple handler functions (post.go and handlers.go) to use the new batch writer instead of immediate database calls.
- Removal of unused historical tracking code for IP addresses, hostnames, and localnames in the nodes package.
Reviewed Changes
File | Description |
---|---|
cmd/tls/handlers/writers.go | Introduces the batch writer and its background processing logic |
cmd/tls/handlers/handlers.go | Integrates the batch writer into the handlers with hardcoded testing values |
cmd/tls/handlers/post.go | Updates event handling to use the batch writer instead of immediate refresh calls |
pkg/nodes/nodes.go | Removes legacy functions and repurposes RefreshLastSeenBatch to update "last_config" |
pkg/logging/dispatch.go | Removes refresh calls for logging, aligning with the new update strategy |
pkg/nodes/metadata.go, names.go, ipaddress.go | Remove unused historical tracking functionality |
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
cmd/tls/handlers/post.go:421
- [nitpick] In the QueryWriteHandler, utils.GetIP(r) is redundantly retrieved. Reuse the previously retrieved IP to ensure consistency across updates.
ip := utils.GetIP(r)
pkg/nodes/nodes.go:464
- The RefreshLastSeenBatch function updates the 'last_config' column, which may be misleading. Verify if this is the intended behavior or if a dedicated 'last_seen' column should be updated instead.
return n.DB.Model(&OsqueryNode{}).Where("id IN ?", nodeID).UpdateColumn("last_config", time.Now()).Error
log.Err(err).Msg("error refreshing last query read") | ||
} | ||
// Refresh node last seen | ||
ip := utils.GetIP(r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] In the QueryReadHandler, utils.GetIP(r) is called a second time even though it was already obtained earlier. Consider capturing the IP address once and reusing it to avoid potential inconsistencies.
ip := utils.GetIP(r) |
Copilot is powered by AI, so mistakes are possible. Review output carefully before use.
return nil | ||
func (n *NodeManager) RefreshLastSeenBatch(nodeID []uint) error { | ||
|
||
return n.DB.Model(&OsqueryNode{}).Where("id IN ?", nodeID).UpdateColumn("last_config", time.Now()).Error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just have a new field called last_seen
and not to recycle any of the existing ones.
) | ||
|
||
// writeEvent represents a single update request. | ||
type writeEvent struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either we have a typed event for each update, or something generic enough that can be used for any updates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the two comments about having a new field last_seen
and to have generic events. Thanks for the PR!
This update introduces a new batch processing mechanism for node last seen time udpate, improving efficiency and reducing redundant operations.
It also removed unused functions. Discontinued historical tracking of IP addresses and node names, as there are no clear use cases for this data.
What is missing in this PR: